Socrates: identification of genomic rearrangements in tumour genomes by re-aligning soft clipped reads

Authors

  • Jan Schröder
  • Arthur Hsu
  • Samantha E. Boyle
  • Geoff MacIntyre
  • Marek Cmero
  • Richard W. Tothill
  • Ricky W. Johnstone
  • Mark Shackleton
  • Anthony T. Papenfuss
Abstract

Motivation: Methods for detecting somatic genome rearrangements in tumours using next-generation sequencing are vital in cancer genomics. Available algorithms use one or more sources of evidence, such as read depth, paired-end reads or split reads to predict structural variants. However, the problem remains challenging due to the significant computational burden and high false-positive or false-negative rates.

Results: In this article, we present Socrates (SOft Clip re-alignment To idEntify Structural variants), a highly efficient and effective method for detecting genomic rearrangements in tumours that uses only split-read data. Socrates has single-nucleotide resolution, identifies micro-homologies and untemplated sequence at break points, has high sensitivity and high specificity and takes advantage of parallelism for efficient use of resources. We demonstrate using simulated and real data that Socrates performs well compared with a number of existing structural variant detection tools.

Availability and implementation: Socrates is released as open source and available from http://bioinf.wehi.edu.au/socrates

Contact: [email protected]

Supplementary information: Supplementary data are available at Bioinformatics online.
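As a rough illustration of what a "soft-clipped read" is, the sketch below pulls the clipped ends of a read out of its SAM CIGAR string; these are the segments a split-read method would try to re-align elsewhere in the genome. This is not Socrates's implementation. The function name `soft_clips` is hypothetical, and the only assumption is the standard SAM convention that `S` operations are present in the read sequence while `H` operations are not.

```python
import re

# One CIGAR operation: a length followed by a SAM operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def soft_clips(cigar, seq):
    """Return (left_clip, right_clip) sequences for a read.

    Hypothetical helper for illustration only. Hard clips (H) are
    ignored because hard-clipped bases are not stored in the read
    sequence, whereas soft-clipped bases (S) are.
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    core = [(n, op) for n, op in ops if op != "H"]
    left, right = "", ""
    if core and core[0][1] == "S":
        left = seq[:core[0][0]]
    if len(core) > 1 and core[-1][1] == "S":
        right = seq[len(seq) - core[-1][0]:]
    return left, right

# A read whose first 5 bases did not align at the mapped location.
print(soft_clips("5S10M", "AAAAACCCCCCCCCC"))
```

In practice only clips above some minimum length would be worth re-aligning; that threshold is a tuning choice and is not specified in the abstract.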


Similar resources

iMapper: a web application for the automated analysis and mapping of insertional mutagenesis sequence data against Ensembl genomes

SUMMARY Insertional mutagenesis is a powerful method for gene discovery. To identify the location of insertion sites in the genome, linker-based polymerase chain reaction (PCR) methods (such as splinkerette-PCR) may be employed. We have developed a web application called iMapper (Insertional Mutagenesis Mapping and Analysis Tool) for the efficient analysis of insertion site sequence reads agains...


Identification of large rearrangements in cancer genomes with barcode linked reads

Large genomic rearrangements involve inversions, deletions and other structural changes that span megabase segments of the human genome. This category of genetic aberration is the cause of many hereditary genetic disorders and contributes to pathogenesis of diseases like cancer. We developed a new algorithm called ZoomX for analysing barcode-linked sequence reads; these sequences can be traced t...


Genome analysis MetaSV: an accurate and integrative structural-variant caller for next generation sequencing

Summary: Structural variations (SVs) are large genomic rearrangements that vary significantly in size, making them challenging to detect with the relatively short reads from next-generation sequencing (NGS). Different SV detection methods have been developed; however, each is limited to specific kinds of SVs with varying accuracy and resolution. Previous works have attempted to combine differen...


MetaSV: an accurate and integrative structural-variant caller for next generation sequencing

UNLABELLED Structural variations (SVs) are large genomic rearrangements that vary significantly in size, making them challenging to detect with the relatively short reads from next-generation sequencing (NGS). Different SV detection methods have been developed; however, each is limited to specific kinds of SVs with varying accuracy and resolution. Previous works have attempted to combine differ...


Assembly of genomic reads of elite indica rice cultivar onto 2101 reference bacterial genomes for identification of co-sequenced endophytic bacteria

Reference based assembly of genomic reads of the elite indica rice cultivar RP Bio-226 was carried out against 2101 reference bacterial genomes using Bowtie-2 genome assembly tool. Five types of data: Number of paired end reads concordantly aligned exactly only once, number of paired end reads concordantly aligned more than once, number of mates that make the pairs aligned exactly only once, nu...



Volume: 30

Publication date: 2014